Image Alt Text

Introduction

Normalized data organizes data focused on reduce the redundancy and ensure data integrity. During normalization, the structure minimizes duplicate information to avoid data redundancy and inconsistencies. Removing redundancy and inconsistencies improves data integrity, reduces storage, and simplifies database management. It also allows for efficient data retrieval and manipulation through querying and reporting.

Normalization obeys a group of rules called normal forms. Normal forms of data are: 1) reduces data redundancy and 2) improves data integrity. Therefore, a normalized table:

  1. Does not have Data Redundancy: data is not duplicated or repeated.

  2. Well-Defined Primary Key: the primary key uniquely identifies each row in the table. The primary key is as a reference for relationships with other tables.

  3. Atomic Values: each column in a normalized table contains atomic (indivisible) values. Each column holds only one value, and there are no repeating groups or arrays within a column.

  4. No Partial Dependencies: each non-primary key column is dependent on the primary key. In other words, all the columns in the table correlate to the primary key as a whole and not just part of it.

  5. No Transitive Dependencies: does not have a column that depends on another column linked to a third column. Column’s relationship must be direct with the primary key and not mediated through other columns.

The following project has as its main goal design a normalized SQL schema from a external database in R.

Covering All Bases: Essential General Information about the Data

We will work with data on Major League Baseball game statistics, compiled by Retrosheet. This dataset contains data from the year range of 1800s untill 2000s.

The defensive positions identification code are :
1. Pitcher
2. Catcher
3. First baseman
4. Second baseman
5. Third baseman
6. Shortstop
7. Left fielder
8. Center fielder
9. Right fielder

We gonna work with game_log.csv as our main table and other three auxiliaries tables:
* park_codes.csv
* person_codes.csv
* team_codes.csv

And file game_log_fields.txt, where we can find detailed information about game_log.csv.

Setting the Stage for Analyzing Baseball Data!

#Running packages
packages <- c('tidyverse',
             'ggplot2',
             'data.table',
             'RSQLite',
             'DBI',
             'kableExtra')

options(rgl.debug = TRUE)

if(sum(as.numeric(!packages %in% installed.packages())) != 0){
  instalador <- packages[!packages %in% installed.packages()]
  for(i in 1:length(instalador)) {
    install.packages(instalador, dependencies = T)
    break()}
  sapply(packages, require, character = T) 
} else {
  sapply(packages, require, character = T) 
}
##  tidyverse    ggplot2 data.table    RSQLite        DBI kableExtra 
##       TRUE       TRUE       TRUE       TRUE       TRUE       TRUE
library(DBI)
# Loading tables just for overview
tables_to_be_load <- c('game_log.csv', 'park_codes.csv', 'person_codes.csv', 'team_codes.csv')
sapply(tables_to_be_load, read.csv)


The next step is to read and store the files at objects. For ´game_log.csv´ is needed to change and specify column types for the data using the col_types argument. In this case it will be needed for all columns be set to “c”, which represents character/string type.

park_codes <-  read.csv('park_codes.csv')
person_codes <- read.csv('person_codes.csv')
team_codes <- read.csv('team_codes.csv')
game_log <- read_csv("game_log.csv", 
                col_types = cols(.default = "c",
                                 v_league = "c", h_league = "c",
                                 `3b_umpire_id` = "c", `3b_umpire_name` = "c",
                                 `2b_umpire_id` = "c", `2b_umpire_name` = "c",
                                 `lf_umpire_id` = "c", `lf_umpire_name` = "c",
                                 `rf_umpire_id` = "c", `rf_umpire_name` = "c",
                                 completion = "c", winning_rbi_batter_id = "c",
                                 winning_rbi_batter_id_name = "c", protest = "c",
                                 v_first_catcher_interference = "c", 
                                 h_first_catcher_interference = "c"))#Specifies column type argument for each listed column

# For import data is also needed have a version of game_log with the original vars

game_log_original <- read.csv("game_log.csv")

Scouting the Field: An Overview of game_log.csv

# Dataset dimension
dim(game_log)
## [1] 171907    161
# Creating table with column and var type
column_indices1 <- c()

# Loop through columns of the "park_codes" data frame
for (col in colnames(game_log)) {
  column_index <- match(col, colnames(game_log))# Find the index of the column
  cat("Index of column '", col, "' is: ", column_index, "\n") # Print the column index
  column_indices1[[col]] <- column_index # Append the column index to the list
}

# Create a data frame with column names and their corresponding indices
column_info <- data.frame(Column_Name = colnames(game_log),
                          Column_Index = unlist(column_indices1),
                          Variable_Class = sapply(game_log, class),
                          row.names = NULL)

# Display the column info using a kable table
kable(column_info, caption = "Column Information Table") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
# How are defensive positions coded?
unique(game_log$h_player_1_def_pos)
##  [1] "5"  "3"  "9"  "2"  "4"  "6"  "7"  "8"  NA   "1"  "10"


The game_log is a very robust dataset with 161 variables and 171,907 observations with details about game’s date, number of games, place, team, score, etc. The defensive positions are coded from 1-10. Yet, is not clear what means 10. All stored variables are in character format and contain at least one NA input.


Also at this stage is not possible identify our primary key for this dataset.

Scouting the Field: An Overview of park_codes.csv

# Dataset Dimension
dim(park_codes)
## [1] 252   9
# Create an empty vector to store column indices
column_indices <- c()

# Loop through columns of the "park_codes" data frame
for (col in colnames(park_codes)) {
  column_index2 <- match(col, colnames(park_codes))# Find the index of the column
  cat("Index of column '", col, "' is: ", column_index2, "\n") # Print the column index
  column_indices[[col]] <- column_index2 # Append the column index to the list
}
## Index of column ' park_id ' is:  1 
## Index of column ' name ' is:  2 
## Index of column ' aka ' is:  3 
## Index of column ' city ' is:  4 
## Index of column ' state ' is:  5 
## Index of column ' start ' is:  6 
## Index of column ' end ' is:  7 
## Index of column ' league ' is:  8 
## Index of column ' notes ' is:  9
# Create a data frame with column names and their corresponding indices
column_info2 <- data.frame(Column_Name = colnames(park_codes),
                          Column_Index = unlist(column_indices),
                          Variable_Class = sapply(park_codes, class),
                          row.names = NULL)

# Display the column info using a kable table
kable(column_info2, caption = "Column Information Table") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Column Information Table
Column_Name Column_Index Variable_Class
park_id 1 character
name 2 character
aka 3 character
city 4 character
state 5 character
start 6 character
end 7 character
league 8 character
notes 9 character
# data sample
head(park_codes)
##   park_id                          name
## 1   ALB01                Riverside Park
## 2   ALT01                 Columbia Park
## 3   ANA01      Angel Stadium of Anaheim
## 4   ARL01             Arlington Stadium
## 5   ARL02 Rangers Ballpark in Arlington
## 6   ATL01 Atlanta-Fulton County Stadium
##                                        aka      city state      start
## 1                                             Albany    NY 09/11/1880
## 2                                            Altoona    PA 04/30/1884
## 3            Edison Field; Anaheim Stadium   Anaheim    CA 04/19/1966
## 4                                          Arlington    TX 04/21/1972
## 5 The Ballpark in Arlington; Ameriquest Fl Arlington    TX 04/11/1994
## 6                                            Atlanta    GA 04/12/1966
##          end league                                          notes
## 1 05/30/1882     NL TRN:9/11/80;6/15&9/10/1881;5/16-5/18&5/30/1882
## 2 05/31/1884     UA                                               
## 3                AL                                               
## 4 10/03/1993     AL                                               
## 5                AL                                               
## 6 09/23/1996     NL

Scouting the Field: An Overview of ‘person_codes’

# Dataset Dimension
dim(person_codes)
## [1] 20494     7
# Create an empty vector to store column indices
column_indices3 <- c()

# Loop through columns of the "park_codes" data frame
for (col in colnames(person_codes)) {
  column_index3 <- match(col, colnames(person_codes))# Find the index of the column
  cat("Index of column '", col, "' is: ", column_index3, "\n") # Print the column index
  column_indices3[[col]] <- column_index3 # Append the column index to the list
}
## Index of column ' id ' is:  1 
## Index of column ' last ' is:  2 
## Index of column ' first ' is:  3 
## Index of column ' player_debut ' is:  4 
## Index of column ' mgr_debut ' is:  5 
## Index of column ' coach_debut ' is:  6 
## Index of column ' ump_debut ' is:  7
# Create a data frame with column names and their corresponding indices
column_info3 <- data.frame(Column_Name = colnames(person_codes),
                          Column_Index = unlist(column_indices3),
                          Variable_Class = sapply(person_codes, class),
                          row.names = NULL)

# Display the column info using a kable table
kable(column_info3, caption = "Column Information Table") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Column Information Table
Column_Name Column_Index Variable_Class
id 1 character
last 2 character
first 3 character
player_debut 4 character
mgr_debut 5 character
coach_debut 6 character
ump_debut 7 character
# data sample
head(person_codes)
##         id    last    first player_debut mgr_debut coach_debut ump_debut
## 1 aardd001 Aardsma    David   04/06/2004                                
## 2 aaroh101   Aaron     Hank   04/13/1954                                
## 3 aarot101   Aaron   Tommie   04/10/1962            04/06/1979          
## 4 aased001    Aase      Don   07/26/1977                                
## 5 abada001    Abad     Andy   09/10/2001                                
## 6 abadf001    Abad Fernando   07/28/2010


This file contains 20494 rows and 7 columns. With all 7 variables listed as character type. The table “person_codes” tells you details about each player. The primary key of this table is id.
The ‘id’ is connected to the game_log table with variables that use player’s id as input.

Scouting the Field: An Overview of ‘team_codes’

# Dataset Dimension
dim(team_codes)
## [1] 150   8
# Create an empty vector to store column indices
column_indices4 <- c()

# Loop through columns of the "park_codes" data frame
for (col in colnames(team_codes)) {
  column_index4 <- match(col, colnames(team_codes))# Find the index of the column
  cat("Index of column '", col, "' is: ", column_index4, "\n") # Print the column index
  column_indices4[[col]] <- column_index4 # Append the column index to the list
}
## Index of column ' team_id ' is:  1 
## Index of column ' league ' is:  2 
## Index of column ' start ' is:  3 
## Index of column ' end ' is:  4 
## Index of column ' city ' is:  5 
## Index of column ' nickname ' is:  6 
## Index of column ' franch_id ' is:  7 
## Index of column ' seq ' is:  8
# Create a data frame with column names and their corresponding indices
column_info4 <- data.frame(Column_Name = colnames(team_codes),
                          Column_Index = unlist(column_indices4),
                          Variable_Class = sapply(team_codes, class),
                          row.names = NULL)

# Display the column info using a kable table
kable(column_info4, caption = "Column Information Table") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Column Information Table
Column_Name Column_Index Variable_Class
team_id 1 character
league 2 character
start 3 integer
end 4 integer
city 5 character
nickname 6 character
franch_id 7 character
seq 8 integer

The team_codes table provides information about each team. Its primary key is team_id. This table is connected to the park_codes and game_log tables with “team_id” variable. The foreign key in game_log tables are v_name, v_league, h_name, h_league. This file contains 150 rows and 8 columns. With variable 8 listed as integer type.

Going Beyond the Box Score: Exploring Additional Information


game_log : has a record of over 170,000 games, chronologically ordered, that occur between 1871 and 2016.
For each game we have: * general information on the game * team level stats for each team * a list of players from each team, numbered, with their defensive positions * the umpires that officiated the game * some ‘awards’, like winning and losing pitcher


person_code : is a list of people related to the team with IDs. The IDs look like a match to those used in the game_log. There are debut dates for: players, managers, coaches and umpires. With some people been one or more of these roles, or more than one person with same name (not sure yet).


Coaches and managers are two different things in baseball. And seems like coaches aren’t recorded in the game log.
park_codes : list of all baseball parks with IDs which match with the game_log, as well as names, nicknames, city and league.
team_codes : list of all teams, with team_ids which seem to match the game_log.

For additional references at the game rules see the sites:

  1. http://probaseballinsider.com/baseball-instruction/baseball-basics/baseball-basics-positions/
  2. https://en.wikipedia.org/wiki/Baseball_positions#:~:text=Each%20position%20conventionally%20has%20an,and%209%20(right%20fielder).

Importing Data into SQLite

 dataset_creation <- "C:/Users/lovis/OneDrive/Documents/MBA/New folder/SQL_PROJ/SQL_proj_3_Baseball/new_Baseball.db"

# Connect to SQLite
conn <- dbConnect(SQLite(), "new_Baseball.db", create = TRUE)
        
# Write table for each dataset
        dbWriteTable(conn = conn, name = "game_log", 
              value = game_log_original, row.names = FALSE, header = TRUE)
        
        dbWriteTable(conn = conn, name = "person_codes", 
            value = person_codes, row.names = FALSE, header = TRUE)
        
        dbWriteTable(conn = conn, name = "team_codes", 
             value = team_codes, row.names = FALSE, header = TRUE)
        
        dbWriteTable(conn = conn, name = "park_codes", 
             value = park_codes, row.names = FALSE, header = TRUE)


# List of tables that were connected
dbListTables(conn)
## [1] "game_log"     "park_codes"   "person_codes" "team_codes"
dbDisconnect(conn)
#### It works ❣

Swing for Success: Data Normalization


Within a normalized table, every column should be related, or be an attribute, to the primary key. Any columns that are not an attribute to the primary key are better placed in their own tables.
In a normalized table, non-primary key columns should be attributes of the primary key. We also want to eliminate repetition of data across tables. So, any redundant data is available elsewhere.

Step 1. Correlating Key Metrics through Data Tables

game_log and park_code share the same data for park_id, game_log and team_codes share team_id that is the same as v_code game_log column data is related to :
person_codes by column debut_dates
team_codes by columns start, end and sequence
park_codes by columns start and end years
In game_log, all offensive and defensive stats are repeated for the home team and the visiting team. * At game_log, we have a listing for 9 players on each team with their positions. Those inputs can generate a new table that tracks player appearances and positions. * We have several awards in game_log like winning pitcher and losing pitcher.


However, while all other tables have a unique id, the there is no unique id for each game in the game_log table. It will be needed to create an primary key for this table with the unique ID’s.


According to the Retrosheet guidelines the requirements to write the game_id are:

  *Game ids must have 12 characters
  *First three: home team name (format: ABC) 
  * Next eight: year,month, day ( format: yyyymmdd) 
  * Last digit: + 0 if single game, + 1 if first game, + 2 if second game.

So, the game id will be a string concatenation, that has a unique ID, of h_name, number_of_game.

#Creating a new column to the table with no data
conn <- dbConnect(SQLite(), dbname = "new_Baseball.db")

c_added <- 'ALTER TABLE game_log
            ADD COLUMN game_id CHARACTER;'
            
dbExecute(conn, c_added)
## [1] 0
# game_id creation
concatenate_str <- 'UPDATE game_log
                    SET game_id = date || h_name || number_of_game
                    WHERE game_id IS NULL '
dbExecute(conn,concatenate_str)
## [1] 171907
#Just for be sure lets check the query

qury <- 'SELECT game_id,
      date,
      h_name,
      number_of_game
  FROM game_log
  LIMIT 5;'

dbGetQuery(conn,qury)
##        game_id     date h_name number_of_game
## 1 18710504FW10 18710504    FW1              0
## 2 18710505WS30 18710505    WS3              0
## 3 18710506RC10 18710506    RC1              0
## 4 18710508CH10 18710508    CH1              0
## 5 18710509TRO0 18710509    TRO              0
#### It works ❣

A QUICK REMINDER TO MY FUTURE SELF

Dear me, the {r adding_col, engine = ‘sql’} is a code chunk header syntax used in R Markdown, which specifies the language (in this case, R) and the engine (in this case, ‘sql’) for executing the code chunk. However, R Markdown does not support ‘sql’ as an engine natively. The supported engines in R Markdown are typically R, Python, Bash, and other programming languages. If you specify ‘sql’ as the engine, it may cause an error as it is not recognized.

Step 2. Design a Normalized schema

Each table should contain columns that are only relevant to the primary key. So, lets categorize columns and split it creating separate tables for each category. The image bellow is the initial schema for data normalization:

Image Alt Text

Creating the table player_info

# Creates the table

player_table <- 'CREATE TABLE IF NOT EXISTS player (player_id TEXT PRIMARY KEY,
                                      first_name TEXT,
                                      last_name TEXT);'
dbExecute(conn, player_table)
## [1] 0
# Inserts values from person_codes
insert <- 'INSERT OR IGNORE INTO player
  SELECT first AS first_name,
         last AS last_name,
         id AS player_id
  FROM person_codes;'
dbExecute(conn, insert)
## [1] 2485
# View sample
view_player <- 'SELECT * FROM player LIMIT 5;'
dbGetQuery(conn, view_player)
##   player_id first_name last_name
## 1     David    Aardsma  aardd001
## 2      Hank      Aaron  aaroh101
## 3    Tommie      Aaron  aarot101
## 4       Don       Aase  aased001
## 5      Andy       Abad  abada001

Creating the table park_info

#Creates table
park_info <- 'CREATE TABLE IF NOT EXISTS park (
                                                park_id TEXT PRIMARY KEY,
                                                name TEXT,
                                                nickname TEXT,
                                                city TEXT,
                                                state TEXT,
                                                notes TEXT);'
dbExecute(conn, park_info)
## [1] 0
# Add info into the table
insert_1 <- 'INSERT OR IGNORE INTO park
              SELECT park_id,
                     name,
                     aka AS nickname,
                     city,
                     state,
                     notes
              FROM park_codes;'

dbExecute(conn, insert_1)
## [1] 252
# View sample
view_park <- 'SELECT * FROM park LIMIT 5;'
dbGetQuery(conn, view_park)
##   park_id                          name
## 1   ALB01                Riverside Park
## 2   ALT01                 Columbia Park
## 3   ANA01      Angel Stadium of Anaheim
## 4   ARL01             Arlington Stadium
## 5   ARL02 Rangers Ballpark in Arlington
##                                   nickname      city state
## 1                                             Albany    NY
## 2                                            Altoona    PA
## 3            Edison Field; Anaheim Stadium   Anaheim    CA
## 4                                          Arlington    TX
## 5 The Ballpark in Arlington; Ameriquest Fl Arlington    TX
##                                            notes
## 1 TRN:9/11/80;6/15&9/10/1881;5/16-5/18&5/30/1882
## 2                                               
## 3                                               
## 4                                               
## 5

Creating the table league

This table comprises the league’s name and its abbreviation (league_id). Is possible to get the leagues_id extracting the unique values from the column ‘leagues’ at team_codes table. And the league name can be crosscheck on web.

unique(team_codes[2])
##    league
## 1      UA
## 2      NL
## 4      PL
## 5        
## 6      AA
## 9      AL
## 11     FL
#Creates table
league_info <- 'CREATE TABLE IF NOT EXISTS league (
                                                league_id TEXT PRIMARY KEY,
                                                name TEXT);'
dbExecute(conn, league_info)
## [1] 0
# Add info into the table
insert_2 <- 'INSERT OR IGNORE INTO league
              VALUES
                  ("NL", "National League"),
                  ("AL", "American League"),
                  ("AA", "American Association"),
                  ("FL", "Federal League"),
                  ("PL", "Players League"),
                  ("UA", "Union Assocation");'

dbExecute(conn, insert_2)
## [1] 6
# View sample
view_league <- 'SELECT * FROM league LIMIT 5;'
dbGetQuery(conn, view_league)
##   league_id                 name
## 1        NL      National League
## 2        AL      American League
## 3        AA American Association
## 4        FL       Federal League
## 5        PL       Players League

Creating the table appearance_type

# Creates table
appearance <- 'CREATE TABLE IF NOT EXISTS appearance_type (
appearance_type_id TEXT PRIMARY KEY,
name TEXT,
category TEXT
);'
dbExecute(conn, appearance)
## [1] 0
# Importing data from external file
appearance_type_original <- read.csv('appearance_type.csv')  # Load the file

dbWriteTable(conn = conn, name = 'appearance_type_original', value = appearance_type_original, header = TRUE)

insert_3 <- 'INSERT OR IGNORE INTO appearance_type
              SELECT *
              FROM appearance_type_original;'
dbExecute(conn, insert_3)
## [1] 31
# View sample
view_appearance_type <- 'SELECT * FROM appearance_type LIMIT 5;'

dbGetQuery(conn, view_appearance_type)
##   appearance_type_id     name category
## 1                 O1 Batter 1  offense
## 2                 O2 Batter 2  offense
## 3                 O3 Batter 3  offense
## 4                 O4 Batter 4  offense
## 5                 O5 Batter 5  offense

Creating the table team

# Creates table
team <- 'CREATE TABLE IF NOT EXISTS team (
team_id TEXT PRIMARY KEY,
league_id TEXT,
city TEXT,
nickname TEXT,
franch_id,
FOREIGN KEY (league_id) REFERENCES league(league_id)
);'
dbExecute(conn, team)
## [1] 0
# Add info into the table

insert_4 <- 'INSERT OR IGNORE INTO team
              SELECT team_id,
league,
city,
nickname,
franch_id
              FROM team_codes;'
dbExecute(conn, insert_4)
## [1] 150
# View sample
view_team <- 'SELECT * FROM team LIMIT 5;'

dbGetQuery(conn, view_team)
##   team_id league_id      city        nickname franch_id
## 1     ALT        UA   Altoona Mountain Cities       ALT
## 2     ARI        NL   Arizona    Diamondbacks       ARI
## 3     BFN        NL   Buffalo          Bisons       BFN
## 4     BFP        PL   Buffalo          Bisons       BFP
## 5     BL1           Baltimore        Canaries       BL1

Modifying the table game_log to create game table

# Add a new column called day
gamelog_mutate <- 'ALTER TABLE game_log
ADD COLUMN day TEXT;
'

dbExecute(conn, gamelog_mutate)

# Add info into day column

insert_5 <- 'INSERT OR IGNORE INTO game_log (day)
              SELECT 
              CASE WHEN day_night = "D" THEN "TRUE" ELSE "FALSE" END
              FROM game_log;'
dbExecute(conn, insert_5)

# Creates table game
game <- 'CREATE TABLE IF NOT EXISTS game (
game_id TEXT PRIMARY KEY,
date INTEGER,
day TEXT,
number_of_game INTEGER,
park_id TEXT,
length_outs INTEGER,
length_minutes INTEGER,
completion TEXT,
forefeit TEXT,
protest TEXT,
attendance INTEGER,
additional_info TEXT,
acquisition_info TEXT,
FOREIGN KEY (park_id) REFERENCES park(park_id)
);'
dbExecute(conn, game)

# Add info into the table

insert_6 <- 'INSERT OR IGNORE INTO game
              SELECT game_id, 
date, day, 
number_of_game,
park_id, 
length_outs, 
length_minutes, 
completion, 
forfeit,
protest,
attendance,
additional_info,
acquisition_info
FROM game_log;'
dbExecute(conn, insert_6)
# View sample
view_game <- 'SELECT * FROM game LIMIT 5;'

dbGetQuery(conn, view_game)

Creating the table team_stats

# Creates table
team_stat <- 'CREATE TABLE IF NOT EXISTS team_stats (
team_id TEXT,
game_id TEXT,
home TEXT,
league_id TEXT,
score INTEGER,
line_score INTEGER,
at_bats INTEGER,
hits INTEGER,
doubles INTEGER,
triples INTEGER,
homeruns INTEGER,
rbi INTEGER,
sacrifice_hits INTEGER,
sacrifice_flies INTEGER,
hit_by_pitch INTEGER,
walks INTEGER,
intentional_walks INTEGER,
strikeouts INTEGER,
stolen_bases INTEGER,
caught_stealing INTEGER,
grounded_into_double INTEGER,
first_catcher_interference INTEGER,
left_on_base INTEGER,
pitchers_used INTEGER,
individual_earned_runs INTEGER,
team_earned_runs INTEGER,
wild_pitches INTEGER,
balks INTEGER,
putouts INTEGER,
assists INTEGER,
errors INTEGER,
passed_balls INTEGER,
double_playes INTEGER,
triple_plays INTEGER);'
dbExecute(conn, team_stat)
## [1] 0
# Add info into the table

insert_7 <- 'INSERT OR IGNORE INTO team_stats
SELECT
h_name,
game_id,
1 AS home,
h_league,
h_score,
h_line_score,
h_at_bats,
h_hits,
h_doubles,
h_triples,
h_homeruns,
h_rbi,
h_sacrifice_hits,
h_sacrifice_flies,
h_hit_by_pitch,
h_walks,
h_intentional_walks,
h_strikeouts,
h_stolen_bases,
h_caught_stealing,
h_grounded_into_double,
h_first_catcher_interference,
h_left_on_base,
h_pitchers_used,
h_individual_earned_runs,
h_team_earned_runs,
h_wild_pitches,
h_balks,
h_putouts,
h_assists,
h_errors,
h_passed_balls,
h_double_plays,
h_triple_plays
FROM game_log
UNION
SELECT
v_name,
game_id,
0 AS home,
v_league,
v_score,
v_line_score,
v_at_bats,
v_hits,
v_doubles,
v_triples,
v_homeruns,
v_rbi,
v_sacrifice_hits,
v_sacrifice_flies,
v_hit_by_pitch,
v_walks,
v_intentional_walks,
v_strikeouts,
v_stolen_bases,
v_caught_stealing,
v_grounded_into_double,
v_first_catcher_interference,
v_left_on_base,
v_pitchers_used,
v_individual_earned_runs,
v_team_earned_runs,
v_wild_pitches,
v_balks,
v_putouts,
v_assists,
v_errors,
v_passed_balls,
v_double_plays,
v_triple_plays
from game_log;'

dbExecute(conn, insert_7)
## [1] 343814
# View sample
view_team_stat <- 'SELECT team_id, game_id, home, league_id, score,line_score, at_bats
FROM team_stats
WHERE at_bats IS NOT NULL
LIMIT 5;'

dbGetQuery(conn, view_team_stat)
## Warning: Column `line_score`: mixed type, first seen values of type integer,
## coercing other values of type string
##   team_id      game_id home league_id score line_score at_bats
## 1     ANA 20000403ANA0    1        AL     2   10000001      35
## 2     ANA 20000404ANA0    1        AL     3       3000      36
## 3     ANA 20000405ANA0    1        AL    12   12610110      33
## 4     ANA 20000407ANA0    1        AL     7   30000310      32
## 5     ANA 20000408ANA0    1        AL     7   20000401      35

SQL Query to Drop the original tables

#Vector with table names
tables_names <- c('game_log', 'park_codes', 'team_codes', 'person_codes')


for (i in tables_names){
# Construct the query to drop the table
drop_query <- paste0('DROP TABLE ', i)

# Execute Query
dbExecute(conn, drop_query)

# Displays a confirmation message
cat("The table ", i, "has been dropped.\n")
}
## The table  game_log has been dropped.
## The table  park_codes has been dropped.
## The table  team_codes has been dropped.
## The table  person_codes has been dropped.